Skip to content

Joint Germline subworkflow haplotypecaller -> Vqsr#595

Merged
FriederikeHanssen merged 90 commits intonf-core:devfrom
nickhsmith:vqsr
Jul 21, 2022
Merged

Joint Germline subworkflow haplotypecaller -> Vqsr#595
FriederikeHanssen merged 90 commits intonf-core:devfrom
nickhsmith:vqsr

Conversation

@nickhsmith
Copy link
Copy Markdown
Contributor

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
    • If you've added a new tool - add to the software_versions process and a regex to scrape_software_versions.py
    • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
    • If necessary, also make a PR on the nf-core/sarek branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint .).
  • Ensure the test suite passes (nextflow run . -profile test,docker).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@maxulysse
Copy link
Copy Markdown
Member

Duplicate of #546, but more advanced, so taking over

@nickhsmith nickhsmith changed the title Vqsr Joint Germline subworkflow haplotypecaller -> Vqsr Jun 20, 2022
Comment thread conf/igenomes.config Outdated
chr_dir = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Sequence/Chromosomes"
dbsnp = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Annotation/GATKBundle/dbsnp_138.b37.vcf"
dbsnp_tbi = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Annotation/GATKBundle/dbsnp_138.b37.vcf.idx"
dbsnp_vqsr = 'dbsnp,known=false,training=true,truth=false,prior=2 dbsnp_138.b37.vcf'
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer having args in the modules.config, and avoiding adding extra files in igenomes.config

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that doesn't fit with the nf-core/module styling as this is expected to be an inputted value

Comment thread conf/igenomes.config Outdated
Comment on lines +42 to +52
// resources for GATK joint germline variant recalibration
RESOURCE_SNP = [
[ res_1000g, dbsnp ],
[ res_1000g, dbsnp_tbi ],
[ res_1000g_vqsr, dbsnp_vqsr ]
]
resource_INDEL = [
[ known_indels, dbsnp ],
[ known_indels_tbi, dbsnp_tbi ],
[ known_indels_mills_vqsr, known_indels_1000g_vqsr, dbsnp_vqsr ]
]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that, but I feel like it should be done in the sarek script or in the joint germline variant calling workflow instead

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would then have to use less descriptive names as for hg19 and hg38 the files are slightly different. So the naming convention has to match regardless of the genome

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about something like known_snps (dbsnp should stay separate because tools like haplotypecaller explicetly want that file)

meta, gvcf, tbi ->
interval_name = meta.num_intervals > 1 ? (gvcf.simpleName - "${meta.id}_").replaceFirst("_",":") : meta.id
new_meta = [id: "joint_germline", interval_name: interval_name, num_intervals: meta.num_intervals]
interval_name = meta.num_intervals > 1 ? (gvcf.simpleName - "${meta.id}_").replaceFirst("_",":") : file(params.intervals).simpleName
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be very careful here. I am afraid this may lead to the weird resume errors/unmatching meta mpas we had before.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, would it be smarter to keep meta the same, and group py another (temporary) key?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the problem is more the reqirval of the file name. I still can't explain this, but sometimes, when retrieving something from the file name like here, the name is incorrectly resolved later on (even though the matched file in the channel element is the correct one). In this case here this would lead to some very wrong results. (in the rest of sarek, since we group on patient ID it only lead to file name clashes, a) easy to find the bug, b) the actual output results were not impacted)

Comment thread conf/modules.config Outdated
Comment thread conf/modules.config Outdated
Comment thread conf/modules.config Outdated
Comment thread subworkflows/local/germline_variant_calling.nf Outdated
Comment thread conf/modules.config Outdated
ext.when = { params.tools && params.tools.split(',').contains('haplotypecaller') && params.joint_germline}
withName: 'GATK4_GENOMICSDBIMPORT' {
ext.prefix = { meta.num_intervals > 1 ? meta.intervals_name : "joint_interval" }
ext.when = { params.tools && params.tools.split(',').contains('haplotypecaller') && params.joint_germline && !params.no_intervals}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ext.when = { params.tools && params.tools.split(',').contains('haplotypecaller') && params.joint_germline && !params.no_intervals}
ext.when = { params.tools && params.tools.split(',').contains('haplotypecaller') && params.joint_germline && !params.no_intervals}

.map{ meta, cram, crai, intervals ->
[meta, cram, crai, intervals, []]

intervals_name = meta.num_intervals == 0 ? "no_interval" : intervals.simpleName
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here we also need a conditional for joint germline. This addition of meta can only happen for haplotypecaller + joint germline or we need to rewrite a bunch of logic for the single sample case


//If no interval file provided (0) then add empty list
intervals_new = num_intervals == 0 ? [] : intervals
intervals_new = num_intervals == 0 ? [] : intervals
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
intervals_new = num_intervals == 0 ? [] : intervals
intervals_new = num_intervals == 0 ? [] : intervals

//Merge scatter/gather vcfs & index
//Rework meta for variantscalled.csv and annotation tools
MERGE_GENOTYPEGVCFS(vcfs_sorted_input.intervals.map{meta, vcf ->
[[id: "joint_variant_calling", patient: "all_samples", variantcaller: "haplotypecaller", num_intervals: meta.num_intervals], vcf]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you do the same nice formatting here as you did in line 116 ff?

[meta, cram, crai, intervals, []]

intervals_name = meta.num_intervals == 0 ? "no_interval" : intervals.simpleName
new_meta = [patient:meta.patient, sample:meta.sample, sex:meta.sex, status:meta.status, id:meta.sample, data_type:meta.data_type, num_intervals:meta.num_intervals, intervals_name:intervals_name]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
new_meta = [patient:meta.patient, sample:meta.sample, sex:meta.sex, status:meta.status, id:meta.sample, data_type:meta.data_type, num_intervals:meta.num_intervals, intervals_name:intervals_name]
new_meta = [
data_type:meta.data_type,
id:meta.sample,
intervals_name:intervals_name,
num_intervals:meta.num_intervals,
patient:meta.patient,
sample:meta.sample,
sex:meta.sex,
status:meta.status
]

Comment thread subworkflows/nf-core/gatk4/joint_germline_variant_calling/main.nf Outdated

versions = ch_versions // channel: [ versions.yml ]
versions = ch_versions // channel: [ versions.yml ]
genotype_vcf = Channel.empty().mix(vcfs_sorted_input.no_intervals,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the vcfs_sorted_input.no_intervals also needs the variantcaller: "haplotypecaller" in it smeta map here to make sure annotation is placing it in the proper folder

//Merge scatter/gather vcfs & index
//Rework meta for variantscalled.csv and annotation tools
MERGE_GENOTYPEGVCFS(vcfs_sorted_input.intervals.map{meta, vcf ->
[[id: "joint_variant_calling", patient: "all_samples", variantcaller: "haplotypecaller", num_intervals: meta.num_intervals], vcf]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[[id: "joint_variant_calling", patient: "all_samples", variantcaller: "haplotypecaller", num_intervals: meta.num_intervals], vcf]
[[
id: "joint_variant_calling",
num_intervals: meta.num_intervals,
patient: "all_samples",
variantcaller: "haplotypecaller"
], vcf]

nickhsmith and others added 11 commits July 20, 2022 15:58
Co-authored-by: FriederikeHanssen <Friederike.hanssen@qbic.uni-tuebingen.de>
Co-authored-by: Maxime U. Garcia <maxime.garcia@scilifelab.se>
Co-authored-by: FriederikeHanssen <Friederike.hanssen@qbic.uni-tuebingen.de>
Co-authored-by: FriederikeHanssen <Friederike.hanssen@qbic.uni-tuebingen.de>
Co-authored-by: Maxime U. Garcia <maxime.garcia@scilifelab.se>
@FriederikeHanssen FriederikeHanssen merged commit 21b9f62 into nf-core:dev Jul 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants